ARMA/ARIMA/SARIMA Models

Modeling U.S. Total Coal Consumption Data

When first attempting to model time series data, the natural starting point is an autoregressive (AR) or moving average (MA) model. These models help us better understand the data and forecast future values from the historical record. Before using them, the time series must be stationary, meaning its mean and variance do not change over time. The most common violation of stationarity is a trend, where values drift steadily over time. So first, to check whether the data is stationary, we can use the ACF (autocorrelation function) and the Augmented Dickey-Fuller test. Below are the Dickey-Fuller test results and the ACF plots for the coal consumption data explored and visualized in previous sections.
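Before the formal tests, a minimal base-R sketch (on simulated data, not the coal series) of why first differencing removes a trend:

```r
# A linear trend violates stationarity; first differencing removes it.
set.seed(42)
trend_series <- 0.5 * (1:200) + rnorm(200)  # upward trend plus noise
differenced  <- diff(trend_series)          # first difference

# The differenced series fluctuates around the slope (0.5)
# instead of growing without bound.
mean(differenced)
```

The same idea, applied at a seasonal lag, is what `diff(log(coal_df_ts), lag=4)` does for the quarterly pattern.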


Code
theme_set(theme_gray(base_size=12,base_family="Palatino"))
coal_df <- read.csv("data/coal_us_consumption.csv") |>
  group_by(period) |>
  summarise(Consumption = sum(consumption)) |>
  filter(!period %in% c('2000-Q1', '2000-Q2',  '2000-Q3', '2000-Q4'))

coal_df_ts <- ts(coal_df$Consumption, start = c(2001,1), end = c(2021,1), frequency = 4)

`Log Transformation` <- log(coal_df_ts)
`Remove Seasonality` <- diff(log(coal_df_ts), lag=4)
`First Diff After Remove Seasonality` <- diff(diff(log(coal_df_ts), lag=4))

a <- ggAcf(coal_df_ts,70) + ggtitle("Original Data")
b <- ggAcf(`Log Transformation`,70) + ggtitle("Log Transformation")
c <- ggAcf(`Remove Seasonality`,70) + ggtitle("Remove Seasonality")
d <- ggAcf(`First Diff After Remove Seasonality`,70) + ggtitle("First Diff After Remove Seasonality")

ACF Plots

Code
ggarrange(a, b, ncol = 1, nrow = 2)

Code
ggarrange(c, d, ncol = 1, nrow = 2)

The ACF plots compare the original data, the log transformation, the seasonally differenced series, and the first difference of the seasonally differenced series.

For white noise and stationary series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly zero, as there is some random variation. For a white noise series, we expect about 95% of the spikes in the ACF to lie within ±2/√T, where T is the length of the time series. It is common to plot these bounds on the ACF (the blue dashed lines above). If one or more large spikes fall outside these bounds, or if substantially more than 5% of spikes do, then the series is probably not white noise; slowly decaying spikes in particular point to non-stationarity.
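The ±2/√T rule is easy to check in base R on simulated white noise (a sketch with assumed values, not the coal series):

```r
# The +/- 2/sqrt(T) white-noise bounds, computed with base R's acf().
set.seed(123)
wn <- rnorm(100)               # white noise, T = 100
bound <- 2 / sqrt(length(wn))  # 2/sqrt(100) = 0.2

acf_vals <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1]  # drop lag 0
share_outside <- mean(abs(acf_vals) > bound)
share_outside  # for white noise, roughly 5% or fewer spikes exceed the bound
```

This is the same bound ggAcf draws as the dashed blue lines.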

This ACF plot of the detrended and differenced US total coal consumption data (the fourth one above) shows almost all spikes inside the blue bounds, which indicates the transformed data is stationary and ready for the analysis that follows.


Stationarity Test & Seasonality Test

Stationarity

Code
adf.test(`First Diff After Remove Seasonality`)

    Augmented Dickey-Fuller Test

data:  First Diff After Remove Seasonality
Dickey-Fuller = -3.9404, Lag order = 4, p-value = 0.01703
alternative hypothesis: stationary

The ADF test's p-value is below 0.05, so first-order differencing combined with seasonal differencing makes the series stationary. No further differencing or detrending is needed.

Seasonality


Code
isSeasonal(`First Diff After Remove Seasonality`, test = "combined", freq = NA)
[1] FALSE

The isSeasonal function from the seastests library shows that our first-differenced and seasonally differenced data has no remaining seasonality.

Now we are confident that our differenced and detrended data is ready for model fitting.
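As a recap of the transformation pipeline, here is a base-R sketch (toy data, assumed values) of how a lag-4 difference cancels a quarterly pattern, mirroring `diff(log(coal_df_ts), lag=4)` above:

```r
# A lag-4 difference removes an exact period-4 cycle, leaving only the
# per-year trend increment.
seasonal_part <- rep(c(5, -2, 3, -6), times = 10)  # exact quarterly cycle
y <- seasonal_part + 0.1 * (1:40)                  # cycle plus mild trend

sd4 <- diff(y, lag = 4)  # seasonal difference at lag 4
unique(round(sd4, 10))   # the cycle cancels; only the constant 0.4 remains
```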



Model Fitting

Code
log(coal_df_ts) |> diff() |>diff(lag=4) |> ggtsdisplay() #this is better

Code
#ggplotly(d) plotting the stationary data's ACF graph

The ACF and PACF plots suggest candidate orders q = 0,1,2,3,4; d = 1; p = 0,1,2,3,4; and seasonal orders Q = 0,1,2; P = 0,1,2,3; D = 1.
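A quick way to see how such candidate orders arise: for an AR(p) process the sample PACF cuts off after lag p. A base-R sketch on simulated data (an assumed example, not the coal series):

```r
# For an AR(2) process, the first two partial autocorrelations dominate
# and later lags fall inside the white-noise bounds.
set.seed(7)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)

pacf_vals <- pacf(y, lag.max = 10, plot = FALSE)$acf
round(pacf_vals[1:3], 2)  # lags 1 and 2 large, lag 3 near zero
```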

Code
SARIMA.c <- function(p1, p2, q1, q2, P1, P2, Q1, Q2, data){
  # Grid search over SARIMA orders. d and D are fixed at 1, and the
  # seasonal period is taken from the ts object itself (4 for quarterly data).
  d <- 1
  D <- 1
  
  i <- 1
  n_models <- (p2 - p1 + 1) * (q2 - q1 + 1) * (P2 - P1 + 1) * (Q2 - Q1 + 1)
  ls <- matrix(NA, nrow = n_models, ncol = 9)
  
  for (p in p1:p2)
  {
    for (q in q1:q2)
    {
      for (P in P1:P2)
      {
        for (Q in Q1:Q2)
        {
          if (p + d + q + P + D + Q <= 9)  # cap total model complexity
          {
            model <- Arima(data, order = c(p-1, d, q-1), seasonal = c(P-1, D, Q-1))
            ls[i, ] <- c(p-1, d, q-1, P-1, D, Q-1, model$aic, model$bic, model$aicc)
            i <- i + 1
          }
        }
      }
    }
  }
  
  temp <- as.data.frame(ls)
  names(temp) <- c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc")
  temp
}
Code
output <- SARIMA.c(p1 = 1, p2 = 5, q1 = 1, q2 = 5, P1 = 1, P2 = 4, Q1 = 1, Q2 = 3, data = coal_df_ts) |>
  drop_na()

minaic <- output[which.min(output$AIC), ]
minbic <- output[which.min(output$BIC), ]
Code
knitr::kable(minaic)
p d q P D Q AIC BIC AICc
8 0 1 0 2 1 1 2807.125 2816.448 2807.689
Code
knitr::kable(minbic)
p d q P D Q AIC BIC AICc
2 0 1 0 0 1 1 2810.769 2815.431 2810.934
Code
auto.arima(coal_df_ts)
Series: coal_df_ts 
ARIMA(0,1,0)(2,1,1)[4] 

Coefficients:
       sar1     sar2     sma1
      0.069  -0.3507  -0.7618
s.e.  0.143   0.1285   0.1291

sigma^2 = 5.598e+14:  log likelihood = -1399.56
AIC=2807.13   AICc=2807.69   BIC=2816.45


Model diagnostics

From model fitting, we generated two models, ARIMA(0,1,0) x (2,1,1) and ARIMA(0,1,0) x (0,1,1); auto.arima() also produced ARIMA(0,1,0) x (2,1,1). ARIMA(0,1,0) x (2,1,1) has the lowest AIC, while ARIMA(0,1,0) x (0,1,1) has the lowest BIC. Now let's run diagnostics on both models to find the better one for forecasting later.

ARIMA(0,1,0) x (2,1,1)

Code
fit1 <- Arima(coal_df_ts, order=c(0,1,0), seasonal = list(order = c(2,1,1), period = 4))

Model Fitting Visual Results and Residuals

Code
set.seed(123)
model_output <- capture.output(sarima(coal_df_ts,0,1,0,2,1,1,4))

Code
coal_df_ts |>
  Arima(order=c(0,1,0), seasonal = list(order = c(2,1,1), period = 4)) |>
  residuals() |> ggtsdisplay()

Code
checkresiduals(fit1)


    Ljung-Box test

data:  Residuals from ARIMA(0,1,0)(2,1,1)[4]
Q* = 8.2062, df = 5, p-value = 0.1452

Model df: 3.   Total lags used: 8

The Ljung-Box test uses the following hypotheses:

H0: The residuals are independently distributed.

HA: The residuals are not independently distributed; they exhibit serial correlation.

Ideally, we would like to fail to reject the null hypothesis. That is, we would like to see the p-value of the test be greater than 0.05 because this means the residuals for our time series model are independent, which is often an assumption we make when creating a model.
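checkresiduals() reports a Ljung-Box statistic that base R exposes as Box.test(); a minimal sketch on stand-in residuals (simulated here, not the fitted model's):

```r
# Ljung-Box test via base R's Box.test(). fitdf subtracts the number of
# estimated model parameters from the degrees of freedom, mirroring the
# "Model df: 3" line in the checkresiduals() output above.
set.seed(123)
resids <- rnorm(80)  # stand-in residuals; in practice use residuals(fit1)
lb <- Box.test(resids, lag = 8, type = "Ljung-Box", fitdf = 3)
lb$parameter  # df = lag - fitdf = 5
```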


There aren't any significant spikes in the ACF, and the model passes the Ljung-Box test (p = 0.1452 > 0.05), so its residuals behave like white noise and the model can be used for forecasting.

Model Fitting

Code
coal_df_ts |>
  Arima(order=c(0,1,0), seasonal = list(order=c(2,1,1), period=4))
Series: coal_df_ts 
ARIMA(0,1,0)(2,1,1)[4] 

Coefficients:
       sar1     sar2     sma1
      0.069  -0.3507  -0.7618
s.e.  0.143   0.1285   0.1291

sigma^2 = 5.598e+14:  log likelihood = -1399.56
AIC=2807.13   AICc=2807.69   BIC=2816.45

Model Output Diagnostics

Code
cat(model_output[25:56], model_output[length(model_output)], sep = "\n") 
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
       sar1     sar2     sma1
      0.069  -0.3507  -0.7618
s.e.  0.143   0.1285   0.1291

sigma^2 estimated as 5.377e+14:  log likelihood = -1399.56,  aic = 2807.13

$degrees_of_freedom
[1] 73

$ttable
     Estimate     SE t.value p.value
sar1   0.0690 0.1430  0.4823  0.6310
sar2  -0.3507 0.1285 -2.7300  0.0079
sma1  -0.7618 0.1291 -5.9024  0.0000

$AIC
[1] 36.93586

$AICc
[1] 36.94024

$BIC
[1] 37.05853

ARIMA(0,1,0) x (0,1,1)

Code
fit2 <- Arima(coal_df_ts, order=c(0,1,0), seasonal = list(order = c(0,1,1), period = 4))

Model Fitting Visual Results and Residuals

Code
set.seed(123)
model_output2 <- capture.output(sarima(coal_df_ts,0,1,0,0,1,1,4))

Code
coal_df_ts |>
  Arima(order=c(0,1,0), seasonal = list(order = c(0,1,1), period = 4)) |>
  residuals() |> ggtsdisplay()

Code
checkresiduals(fit2)


    Ljung-Box test

data:  Residuals from ARIMA(0,1,0)(0,1,1)[4]
Q* = 16.558, df = 7, p-value = 0.02048

Model df: 1.   Total lags used: 8



There are two significant spikes in the ACF, and the model fails the Ljung-Box test (p = 0.0205 < 0.05), so its residuals show serial correlation and its forecasts should be treated with caution.

Model Fitting

Code
coal_df_ts |>
  Arima(order=c(0,1,0), seasonal = list(order = c(0,1,1), period = 4))
Series: coal_df_ts 
ARIMA(0,1,0)(0,1,1)[4] 

Coefficients:
         sma1
      -0.8690
s.e.   0.0877

sigma^2 = 6.029e+14:  log likelihood = -1403.38
AIC=2810.77   AICc=2810.93   BIC=2815.43

Model Output Diagnostics

Code
cat(model_output2[20:49], model_output2[length(model_output2)], sep = "\n")
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         sma1
      -0.8690
s.e.   0.0877

sigma^2 estimated as 5.949e+14:  log likelihood = -1403.38,  aic = 2810.77

$degrees_of_freedom
[1] 75

$ttable
     Estimate     SE t.value p.value
sma1   -0.869 0.0877 -9.9034       0

$AIC
[1] 36.98381

$AICc
[1] 36.98452

$BIC
[1] 37.04514


Model Selection

From the model diagnostics above, ARIMA(0,1,0) x (2,1,1) is also the model auto.arima() produced, and it has fewer significant spikes in the ACF and PACF plots of its residuals. It also has a smaller sigma squared, i.e. a smaller residual variance, and an estimator with smaller variance is more efficient.
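The selection logic can be sketched in base R with stats::arima() on simulated data (an assumed example; this document itself uses forecast::Arima, which reports the same criteria):

```r
# Compare two candidate models on the same series by information criteria.
set.seed(1)
y <- arima.sim(model = list(ar = 0.6), n = 200)  # data generated by an AR(1)

fit_ar1 <- arima(y, order = c(1, 0, 0))  # correctly specified
fit_ma1 <- arima(y, order = c(0, 0, 1))  # mis-specified

# Lower AIC is better; the AR(1) fit should win here.
AIC(fit_ar1) < AIC(fit_ma1)
```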

Therefore, ARIMA(0,1,0) x (2,1,1) is the better model.


Forecasting

Let’s forecast for the next three years using the model we just selected.

forecast Method

Code
c <- fit1 |> forecast(h=12) |> autoplotly() + ggtitle("Quarterly Coal Consumption Three-Year-Forecasting") + xlab("Year") + ylab("Short Ton")
c |>
  layout(showlegend = F,
         xaxis = list(rangeslider = list(visible = T)))

sarima.for() Method

Code
sarima.for(coal_df_ts, 12, 0,1,0,2,1,1,4)

$pred
          Qtr1      Qtr2      Qtr3      Qtr4
2021           274828535 376893671 306317902
2022 319770967 282898241 373853941 294614305
2023 276483317 237626523 331810754 257541142
2024 244525100                              

$se
         Qtr1     Qtr2     Qtr3     Qtr4
2021          23187703 32792364 40162280
2022 46375406 55401734 63150819 70047839
2023 76324135 81363930 86109260 90606402
2024 94890651                           


Compare with benchmark method

Code
values <- append(coal_df$Consumption, rep(NA,11))
coal_ts_df <- data.frame(Year = seq(as.Date("2000/1/1"), by = "quarter", length.out = 92), Short.tons=values)

coal_ts_df$meanf <- append(rep(NA,80), meanf(coal_df_ts, h=12)$mean)
coal_ts_df$naive <- append(rep(NA,80), naive(coal_df_ts, h=12)$mean)
coal_ts_df$snaive <- append(rep(NA,80), snaive(coal_df_ts, h=12)$mean)
coal_ts_df$rwf <- append(rep(NA,80), rwf(coal_df_ts, h=12, drift=TRUE)$mean)
coal_ts_df$fit <- append(rep(NA,80), forecast(fit1, h=12)$mean)

p <- ggplot(coal_ts_df) + 
  geom_line(aes(x=Year, y = Short.tons)) + 
  geom_line(aes(x=Year, y = meanf, color = "Mean")) +
  geom_line(aes(x=Year, y = naive, color = "Naïve")) +
  geom_line(aes(x=Year, y = snaive, color = "SNaïve")) +
  geom_line(aes(x=Year, y = rwf, color = "Drift")) +
  geom_line(aes(x=Year, y = fit, color = "Model")) +
  ggtitle("Comparison of the Fitted Model and Benchmark Methods") +
  ylab("Short tons")
  
ggplotly(p)

Benchmark methods give modelers a baseline: a score the fitted model must outperform. Normally the state of the art serves as the baseline, but for problems with no existing solution one should build one's own.

Average Method

This method takes the average (or "mean") of the entire historical series and uses it to forecast all future values. It is most useful for data with small variance, whose values lie close to the mean.

Drift Method

Drift is the average change observed across the historical data; this method extends that average change into the future. In effect, it draws a straight line through the first and last observations and extends it forward. It works well for data that follows a persistent trend.

Naïve Method

This method uses the most recent observation as the forecast for every future step, assuming tomorrow's value equals today's.
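The three benchmark forecasts described above reduce to one-line formulas; a base-R sketch on a toy series (formulas assumed to match forecast::meanf(), naive(), and rwf(drift = TRUE)):

```r
# Base-R versions of the benchmark forecasts on a toy history.
y <- c(10, 12, 11, 15, 14, 18)  # toy data
h <- 3                          # forecast horizon
n <- length(y)

mean_fc  <- rep(mean(y), h)                         # average method
naive_fc <- rep(y[n], h)                            # naive: repeat last value
drift_fc <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)  # line through first & last

drift_fc  # 19.6, 21.2, 22.8
```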

Comparison Conclusion

The fitted model outperforms all of the benchmark methods: the mean, naïve, seasonal naïve, and drift methods.

Modeling U.S. Total CO2 Emissions Data

When first attempting to model time series data, the natural starting point is an autoregressive (AR) or moving average (MA) model. These models help us better understand the data and forecast future values from the historical record. Before using them, the time series must be stationary, meaning its mean and variance do not change over time. The most common violation of stationarity is a trend, where values drift steadily over time. So first, to check whether the data is stationary, we can use the ACF (autocorrelation function) and the Augmented Dickey-Fuller test. Below are the Dickey-Fuller test results and the ACF plots for the CO2 emissions data explored and visualized in previous sections.


Code
theme_set(theme_gray(base_size=12,base_family="Palatino"))
emissions_df <- read.csv("data/df_total_monthly_CO2_emissions.csv", skip=1, row.names = 1)

emissions_ts <- ts(emissions_df$sum, start = c(1973,1), end = c(2022,12), frequency = 12)

`Log Transformation` <- log(emissions_ts)
`Remove Seasonality` <- diff(log(emissions_ts), lag=12)
`First Diff After Remove Seasonality` <- diff(diff(log(emissions_ts), lag=12))

a <- ggAcf(emissions_ts,70) + ggtitle("Original Data")
b <- ggAcf(`Log Transformation`,70) + ggtitle("Log Transformation")
c <- ggAcf(`Remove Seasonality`,70) + ggtitle("Remove Seasonality")
d <- ggAcf(`First Diff After Remove Seasonality`,70) + ggtitle("First Diff After Remove Seasonality")

ACF Plots

Code
ggarrange(a, b, ncol = 1, nrow = 2)

Code
ggarrange(c, d, ncol = 1, nrow = 2)

The ACF plots compare the original data, the log transformation, the seasonally differenced series, and the first difference of the seasonally differenced series.

For white noise and stationary series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly zero, as there is some random variation. For a white noise series, we expect about 95% of the spikes in the ACF to lie within ±2/√T, where T is the length of the time series. It is common to plot these bounds on the ACF (the blue dashed lines above). If one or more large spikes fall outside these bounds, or if substantially more than 5% of spikes do, then the series is probably not white noise; slowly decaying spikes in particular point to non-stationarity.

This ACF plot of the detrended and differenced US total CO2 emissions data (the fourth one above) shows almost all spikes inside the blue bounds, which indicates the transformed data is stationary and ready for the analysis that follows.



Stationarity Test & Seasonality Test

Stationarity

Code
adf.test(`First Diff After Remove Seasonality`)
Warning in adf.test(`First Diff After Remove Seasonality`): p-value smaller than
printed p-value

    Augmented Dickey-Fuller Test

data:  First Diff After Remove Seasonality
Dickey-Fuller = -10.227, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary

The ADF test's p-value is below 0.05, so first-order differencing combined with seasonal differencing makes the series stationary. No further differencing or detrending is needed.

Seasonality


Code
isSeasonal(`First Diff After Remove Seasonality`, test = "combined", freq = NA)
[1] FALSE

The isSeasonal function from the seastests library shows that our first-differenced and seasonally differenced data has no remaining seasonality.

Now we are confident that our differenced and detrended data is ready for model fitting.



Model Fitting

Code
log(emissions_ts) |> diff() |> diff(lag=12) |> ggtsdisplay() #this is better

Code
#ggplotly(d) plotting the stationary data's ACF graph

The ACF and PACF plots suggest candidate orders q = 0,1,2; d = 1,2; p = 0,1,2; and seasonal orders Q = 0,1,2,3,4; P = 0,1,2,3,4; D = 1,2.

Code
SARIMA.c <- function(p1, p2, q1, q2, P1, P2, Q1, Q2, d, D, data){
  # Same grid search as before, but with d and D passed as arguments;
  # the seasonal period is again taken from the ts object (12 for monthly data).
  i <- 1
  n_models <- (p2 - p1 + 1) * (q2 - q1 + 1) * (P2 - P1 + 1) * (Q2 - Q1 + 1)
  ls <- matrix(NA, nrow = n_models, ncol = 9)
  
  for (p in p1:p2)
  {
    for (q in q1:q2)
    {
      for (P in P1:P2)
      {
        for (Q in Q1:Q2)
        {
          if (p + d + q + P + D + Q <= 9)  # cap total model complexity
          {
            model <- Arima(data, order = c(p-1, d, q-1), seasonal = c(P-1, D, Q-1))
            ls[i, ] <- c(p-1, d, q-1, P-1, D, Q-1, model$aic, model$bic, model$aicc)
            i <- i + 1
          }
        }
      }
    }
  }
  
  temp <- as.data.frame(ls)
  names(temp) <- c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc")
  temp
}
Code
# q = 0,1,2, d = 1, p = 0,1,2, Q = 0,1,2,3,4 P = 0,1,2,3,4, and D = 1
output1 <- SARIMA.c(p1 = 1, p2 = 3, q1 = 1, q2 = 3, P1 = 1, P2 = 4, Q1 = 1, Q2 = 4, d=1, D=1, data = emissions_ts) |>
  drop_na()

minaic <- output1[which.min(output1$AIC), ]
minbic <- output1[which.min(output1$BIC), ]
Code
knitr::kable(minaic)
p d q P D Q AIC BIC AICc
27 1 1 1 0 1 1 2813.265 2822.588 2813.828
Code
knitr::kable(minbic)
p d q P D Q AIC BIC AICc
27 1 1 1 0 1 1 2813.265 2822.588 2813.828
Code
auto.arima(emissions_ts)
Series: emissions_ts 
ARIMA(1,1,1)(2,1,2)[12] 

Coefficients:
         ar1      ma1    sar1     sar2     sma1    sma2
      0.3995  -0.7933  0.5958  -0.2730  -1.3767  0.5187
s.e.  0.0806   0.0575  0.1204   0.0493   0.1205  0.1012

sigma^2 = 291.5:  log likelihood = -2504.01
AIC=5022.02   AICc=5022.22   BIC=5052.65


Model diagnostics

From model fitting, the grid search produced one model, ARIMA(1,1,1) x (0,1,1), which minimizes both AIC and BIC over the grid. auto.arima() produced ARIMA(1,1,1) x (2,1,2), which has the lowest AIC, BIC, and AICc overall. Now let's run diagnostics on both models to find the better one for forecasting later.

ARIMA(1,1,1) x (2,1,2)

Code
fit1 <- Arima(emissions_ts, order=c(1,1,1), seasonal = list(order = c(2,1,2), period = 12))

Model Fitting Visual Results and Residuals

Code
set.seed(123)
model_output <- capture.output(sarima(emissions_ts,1,1,1,2,1,2,12))

Code
emissions_ts |>
  Arima(order=c(1,1,1), seasonal = list(order = c(2,1,2), period = 12)) |>
  residuals() |> ggtsdisplay()

Code
checkresiduals(fit1)


    Ljung-Box test

data:  Residuals from ARIMA(1,1,1)(2,1,2)[12]
Q* = 26.72, df = 18, p-value = 0.0844

Model df: 6.   Total lags used: 24

The Ljung-Box test uses the following hypotheses:

H0: The residuals are independently distributed.

HA: The residuals are not independently distributed; they exhibit serial correlation.

Ideally, we would like to fail to reject the null hypothesis. That is, we would like to see the p-value of the test be greater than 0.05 because this means the residuals for our time series model are independent, which is often an assumption we make when creating a model.


There is one significant spike in the ACF, but the model passes the Ljung-Box test (p = 0.0844 > 0.05), so its residuals are broadly consistent with white noise and the model can be used for forecasting.

Model Fitting

Code
emissions_ts |>
  Arima(order=c(1,1,1), seasonal = list(order=c(2,1,2), period=12))
Series: emissions_ts 
ARIMA(1,1,1)(2,1,2)[12] 

Coefficients:
         ar1      ma1    sar1     sar2     sma1    sma2
      0.3995  -0.7933  0.5958  -0.2730  -1.3767  0.5187
s.e.  0.0806   0.0575  0.1204   0.0493   0.1205  0.1012

sigma^2 = 291.5:  log likelihood = -2504.01
AIC=5022.02   AICc=5022.22   BIC=5052.65

Model Output Diagnostics

Code
cat(model_output[52:86], model_output[length(model_output)], sep = "\n") 
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         ar1      ma1    sar1     sar2     sma1    sma2
      0.3995  -0.7933  0.5958  -0.2730  -1.3767  0.5187
s.e.  0.0806   0.0575  0.1204   0.0493   0.1205  0.1012

sigma^2 estimated as 288.5:  log likelihood = -2504.01,  aic = 5022.02

$degrees_of_freedom
[1] 581

$ttable
     Estimate     SE  t.value p.value
ar1    0.3995 0.0806   4.9561       0
ma1   -0.7933 0.0575 -13.7845       0
sar1   0.5958 0.1204   4.9484       0
sar2  -0.2730 0.0493  -5.5347       0
sma1  -1.3767 0.1205 -11.4227       0
sma2   0.5187 0.1012   5.1276       0

$AIC
[1] 8.555405

$AICc
[1] 8.555652

$BIC
[1] 8.607577
Code
summary(fit1)
Series: emissions_ts 
ARIMA(1,1,1)(2,1,2)[12] 

Coefficients:
         ar1      ma1    sar1     sar2     sma1    sma2
      0.3995  -0.7933  0.5958  -0.2730  -1.3767  0.5187
s.e.  0.0806   0.0575  0.1204   0.0493   0.1205  0.1012

sigma^2 = 291.5:  log likelihood = -2504.01
AIC=5022.02   AICc=5022.22   BIC=5052.65

Training set error measures:
                      ME     RMSE      MAE         MPE     MAPE      MASE
Training set -0.05728885 16.80051 12.80849 -0.06346395 2.182867 0.5794284
                     ACF1
Training set -0.008209425

ARIMA(1,1,1) x (0,1,1)

Code
fit2 <- Arima(emissions_ts, order=c(1,1,1), seasonal = list(order = c(0,1,1), period = 12))

Model Fitting Visual Results and Residuals

Code
set.seed(123)
model_output2 <- capture.output(sarima(emissions_ts,1,1,1,0,1,1,12))

Code
emissions_ts |>
  Arima(order=c(1,1,1), seasonal = list(order = c(0,1,1), period = 12)) |>
  residuals() |> ggtsdisplay()

Code
checkresiduals(fit2)


    Ljung-Box test

data:  Residuals from ARIMA(1,1,1)(0,1,1)[12]
Q* = 46.721, df = 21, p-value = 0.001024

Model df: 3.   Total lags used: 24

There are two significant spikes in the ACF, and the model fails the Ljung-Box test (p = 0.001 < 0.05), so its residuals are serially correlated and the model should not be used for forecasting.

Model Fitting

Code
emissions_ts |>
  Arima(order=c(1,1,1), seasonal = list(order = c(0,1,1), period = 12))
Series: emissions_ts 
ARIMA(1,1,1)(0,1,1)[12] 

Coefficients:
         ar1      ma1     sma1
      0.4287  -0.8131  -0.8291
s.e.  0.0727   0.0497   0.0218

sigma^2 = 307.2:  log likelihood = -2519.7
AIC=5047.41   AICc=5047.48   BIC=5064.91

Model Output Diagnostics

Code
cat(model_output2[35:66], model_output2[length(model_output2)], sep = "\n")
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         ar1      ma1     sma1
      0.4287  -0.8131  -0.8291
s.e.  0.0727   0.0497   0.0218

sigma^2 estimated as 305.6:  log likelihood = -2519.7,  aic = 5047.41

$degrees_of_freedom
[1] 584

$ttable
     Estimate     SE  t.value p.value
ar1    0.4287 0.0727   5.8949       0
ma1   -0.8131 0.0497 -16.3708       0
sma1  -0.8291 0.0218 -37.9917       0

$AIC
[1] 8.598649

$AICc
[1] 8.598719

$BIC
[1] 8.628462
Code
summary(fit2)
Series: emissions_ts 
ARIMA(1,1,1)(0,1,1)[12] 

Coefficients:
         ar1      ma1     sma1
      0.4287  -0.8131  -0.8291
s.e.  0.0727   0.0497   0.0218

sigma^2 = 307.2:  log likelihood = -2519.7
AIC=5047.41   AICc=5047.48   BIC=5064.91

Training set error measures:
                      ME     RMSE      MAE         MPE     MAPE      MASE
Training set -0.04531625 17.29221 12.97933 -0.06676721 2.213159 0.5871565
                    ACF1
Training set -0.00373604


Model Selection

From the model diagnostics above, ARIMA(1,1,1) x (2,1,2) is also the model auto.arima() produced, and it has fewer significant spikes in the ACF and PACF plots of its residuals. It also has a smaller sigma squared, i.e. a smaller residual variance, and an estimator with smaller variance is more efficient. Moreover, all the training-set error measures of the first model, especially RMSE, are lower than those of the second.

Therefore, ARIMA(1,1,1) x (2,1,2) is the better model.


Forecasting

Let's forecast for the next five years using the model we just selected.

forecast Method

Code
a <- fit1 |> forecast(h=80) |> autoplotly() + ggtitle("Monthly CO2 Emissions Five-Year-Forecasting") + xlab("Year") + ylab("Million Metric Tons of Carbon Dioxide")
a %>%
  layout(showlegend = F,
         xaxis = list(rangeslider = list(visible = T)))

sarima.for() Method

Code
sarima.for(emissions_ts, 60, 1,1,1,2,1,2,12)

$pred
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
2023 618.7850 522.7436 523.7467 456.6446 484.6787 525.0852 592.4839 588.2336
2024 595.4889 510.3660 509.0163 439.2943 467.0804 511.1400 581.6272 578.6118
2025 581.2447 504.6408 496.4980 429.0742 455.7201 501.7074 571.3342 569.9160
2026 574.3190 499.8097 488.2619 422.9227 448.9569 495.0954 563.3664 562.5627
2027 569.2822 493.6951 481.9732 417.2486 443.2297 488.9319 556.6300 555.7564
          Sep      Oct      Nov      Dec
2023 518.0627 499.4276 511.9846 573.6319
2024 507.6640 490.8693 500.3659 555.9238
2025 496.9108 480.3016 490.3520 544.5836
2026 488.5437 471.5427 482.7586 537.8624
2027 481.6952 464.4100 476.1690 532.1546

$se
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
2023 16.98534 19.86316 21.27671 22.26454 23.09347 23.85051 24.56782 25.25844
2024 29.43616 30.53130 31.44341 32.27622 33.06763 33.83262 34.57761 35.30568
2025 38.73656 39.38616 40.02545 40.65479 41.27458 41.88521 42.48707 43.08052
2026 46.16912 46.85184 47.48856 48.10286 48.70392 49.29551 49.87923 50.45586
2027 53.80342 54.64802 55.39405 56.09763 56.77985 57.44901 58.10852 58.75985
          Sep      Oct      Nov      Dec
2023 25.92822 26.58017 27.21614 27.83744
2024 36.01857 36.71743 37.40316 38.07652
2025 43.66591 44.24356 44.81376 45.37679
2026 51.02584 51.58947 52.14699 52.69860
2027 59.40374 60.04060 60.67073 61.29436


Compare with benchmark method

Code
values <- append(emissions_df$sum, rep(NA,80))
emissions_ts_df <- data.frame(Year = seq(as.Date("1973/1/1"), by = "month", length.out = 680), Million.metric.tons=values)

emissions_ts_df$meanf <- append(rep(NA,600), meanf(emissions_ts, h=80)$mean)
emissions_ts_df$naive <- append(rep(NA,600), naive(emissions_ts, h=80)$mean)
emissions_ts_df$snaive <- append(rep(NA,600), snaive(emissions_ts, h=80)$mean)
emissions_ts_df$rwf <- append(rep(NA,600), rwf(emissions_ts, h=80, drift=TRUE)$mean)
emissions_ts_df$fit <- append(rep(NA,600), forecast(fit1, h=80)$mean)

p <- ggplot(emissions_ts_df) + 
  geom_line(aes(x=Year, y = Million.metric.tons)) + 
  geom_line(aes(x=Year, y = meanf, color = "Mean")) +
  geom_line(aes(x=Year, y = naive, color = "Naïve")) +
  geom_line(aes(x=Year, y = snaive, color = "SNaïve")) +
  geom_line(aes(x=Year, y = rwf, color = "Drift")) +
  geom_line(aes(x=Year, y = fit, color = "Model")) +
  ggtitle("Comparison of the Fitted Model and Benchmark Methods") +
  ylab("Million Metric Tons of Carbon Dioxide")
  
ggplotly(p)

Benchmark methods give modelers a baseline: a score the fitted model must outperform. Normally the state of the art serves as the baseline, but for problems with no existing solution one should build one's own.

Average Method

This method takes the average (or "mean") of the entire historical series and uses it to forecast all future values. It is most useful for data with small variance, whose values lie close to the mean.

Drift Method

Drift is the average change observed across the historical data; this method extends that average change into the future. In effect, it draws a straight line through the first and last observations and extends it forward. It works well for data that follows a persistent trend.

Naïve Method

This method uses the most recent observation as the forecast for every future step, assuming tomorrow's value equals today's.

Comparison Conclusion

The fitted model outperforms all of the benchmark methods: the mean, naïve, seasonal naïve, and drift methods.